Index replication using crawl modification information

ABSTRACT

Systems, methodologies, media, and other embodiments associated with index replication using crawl modification information are described. One exemplary system embodiment includes an enterprise search system comprising a target search system comprising an index logic that uses modified crawl information related to items associated with sources to maintain an index that supports searching of the items; and a crawl search system comprising a pipeline processor configured to receive modified crawl information related to the items and to propagate the modified crawl information to the target search system.

CROSS REFERENCE TO RELATED APPLICATIONS

This application claims the benefit of U.S. Provisional Patent Application Ser. No. 60/777,988 filed Mar. 1, 2006, titled “Systems and Methods For Searching”, attorney docket number 27252-86-PRO.

This application also claims the benefit of U.S. Provisional Patent Application Ser. No. 60/853,487 filed Oct. 20, 2006, titled “Index Replication Using Crawl Modification Information”, inventors Krishnaprasad et al., and attorney docket number 27252-94-PRO.

BACKGROUND

An enterprise may have a variety of data having a variety of formats. This disparate data may be stored in a number of locations. For example, emails may be stored in email servers and on user desktop systems. Similarly, calendar information may be stored in a calendar server and on user desktop systems. Items (e.g., word processing files, spreadsheets, presentations, web pages) may be stored in different locations distributed throughout the enterprise. An index/search system can facilitate locating and retrieving relevant items of the enterprise.

BRIEF DESCRIPTION OF THE DRAWINGS

The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate various example systems, methods, and other example embodiments of various aspects of the invention. It will be appreciated that the illustrated element boundaries (e.g., boxes, groups of boxes, or other shapes) in the figures represent one example of the boundaries. One of ordinary skill in the art will appreciate that one element may be designed as multiple elements or that multiple elements may be designed as one element. An element shown as an internal component of another element may be implemented as an external component and vice versa. Furthermore, elements may not be drawn to scale.

FIG. 1 illustrates an example enterprise search system.

FIG. 2 illustrates another example enterprise search system.

FIG. 3 illustrates an example distributed enterprise crawl system.

FIG. 4 illustrates an example method for replicating search information.

FIG. 5 illustrates an example method for replicating search information.

FIG. 6 illustrates an example method for replicating search information.

FIG. 7 illustrates an example method for replicating search information in a distributed crawl environment.

FIG. 8 illustrates an example computing environment in which example systems and methods illustrated herein can operate.

DETAILED DESCRIPTION

Example systems, methods, computer-readable media, software and other embodiments are described herein that relate to replication and/or high availability of search information. In one embodiment, a crawl search system can crawl items, identify modification(s) to the crawled items, and provide information regarding the modification(s) to one or more target search systems. Each target search system can use the information regarding the modification(s) to independently update its associated index. In this manner, the target search system(s) do not separately crawl the items; instead, they process the information regarding modification(s) received from the crawl search system.

The following includes definitions of selected terms employed herein. The definitions include various examples and/or forms of components that fall within the scope of a term and that may be used for implementation. The examples are not intended to be limiting. Both singular and plural forms of terms may be within the definitions.

As used in this application, the term “computer component” refers to a computer-related entity, either hardware, firmware, software, a combination thereof, or software in execution. For example, a computer component can be, but is not limited to being, a process running on a processor, a processor, an object, an executable, a thread of execution, a program, and a computer. By way of illustration, both an application running on a server and the server can be computer components. One or more computer components can reside within a process and/or thread of execution and a computer component can be localized on one computer and/or distributed between two or more computers.

“Computer communication”, as used herein, refers to a communication between two or more computing devices (e.g., computer, personal digital assistant, cellular telephone) and can be, for example, a network transfer, a file transfer, an applet transfer, an email, a hypertext transfer protocol (HTTP) transfer, and so on. A computer communication can occur across, for example, a wireless system (e.g., IEEE 802.11), an Ethernet system (e.g., IEEE 802.3), a token ring system (e.g., IEEE 802.5), a local area network (LAN), a wide area network (WAN), a point-to-point system, a circuit switching system, a packet switching system, and so on.

“Computer-readable medium”, as used herein, refers to a medium that participates in directly or indirectly providing signals, instructions and/or data. A computer-readable medium may take forms, including, but not limited to, non-volatile media, volatile media, and transmission media. Non-volatile media may include, for example, optical or magnetic disks and so on. Volatile media may include, for example, semiconductor memories, dynamic memory and the like. Transmission media may include coaxial cables, copper wire, fiber optic cables, and the like. Transmission media can also take the form of electromagnetic radiation, like that generated during radio-wave and infra-red data communications, or take the form of one or more groups of signals. Common forms of a computer-readable medium include, but are not limited to, a floppy disk, a flexible disk, a hard disk, a magnetic tape, other magnetic medium, a CD-ROM, other optical medium, punch cards, paper tape, other physical medium with patterns of holes, a RAM, a ROM, an EPROM, a FLASH-EPROM, or other memory chip or card, a memory stick, a carrier wave/pulse, and other media from which a computer, a processor or other electronic device can read. Signals used to propagate instructions or other software over a network, like the Internet, can be considered a “computer-readable medium.”

“Data store”, as used herein, refers to a physical and/or logical entity that can store data. A data store may be, for example, a database, a table, a file, a list, a queue, a heap, a memory, a register, and so on. A data store may reside in one logical and/or physical entity and/or may be distributed between two or more logical and/or physical entities.

“Logic”, as used herein, includes but is not limited to hardware, firmware, software and/or combinations of each to perform a function(s) or an action(s), and/or to cause a function or action from another logic, method, and/or system. For example, based on a desired application or needs, logic may include a software controlled microprocessor, discrete logic like an application specific integrated circuit (ASIC), an analog circuit, a digital circuit, a programmed logic device, a memory device containing instructions, or the like. Logic may include one or more gates, combinations of gates, or other circuit components. Logic may also be fully embodied as software. Where multiple logical logics are described, it may be possible to incorporate the multiple logical logics into one physical logic. Similarly, where a single logical logic is described, it may be possible to distribute that single logical logic between multiple physical logics.

An “operable connection”, or a connection by which entities are “operably connected”, is one in which signals, physical communications, and/or logical communications may be sent and/or received. Typically, an operable connection includes a physical interface, an electrical interface, and/or a data interface, but it is to be noted that an operable connection may include differing combinations of these or other types of connections sufficient to allow operable control. For example, two entities can be operably connected by being able to communicate signals to each other directly or through one or more intermediate entities like a processor, operating system, a logic, software, or other entity. Logical and/or physical communication channels can be used to create an operable connection.

“Query”, as used herein, refers to a semantic construction that facilitates gathering and processing information. A query might be formulated in a database query language like structured query language (SQL) or object query language (OQL). A query might be implemented in computer code (e.g., C#, C++, JavaScript) that can be employed to gather information from various data stores and/or information sources.

“Signal”, as used herein, includes but is not limited to one or more electrical or optical signals, analog or digital signals, data, one or more computer or processor instructions, messages, a bit or bit stream, or other means that can be received, transmitted and/or detected.

“Software”, as used herein, includes but is not limited to, one or more computer or processor instructions that can be read, interpreted, compiled, and/or executed and that cause a computer, processor, or other electronic device to perform functions, actions and/or behave in a desired manner. The instructions may be embodied in various forms like routines, algorithms, modules, methods, threads, and/or programs including separate applications or code from dynamically linked libraries. Software may also be implemented in a variety of executable and/or loadable forms including, but not limited to, a stand-alone program, a function call (local and/or remote), a servlet, an applet, instructions stored in a memory, part of an operating system or other types of executable instructions. It will be appreciated by one of ordinary skill in the art that the form of software may be dependent on, for example, requirements of a desired application, the environment in which it runs, and/or the desires of a designer/programmer or the like. It will also be appreciated that computer-readable and/or executable instructions can be located in one logic and/or distributed between two or more communicating, co-operating, and/or parallel processing logics and thus can be loaded and/or executed in serial, parallel, massively parallel and other manners.

Suitable software for implementing the various components of the example systems and methods described herein includes programming languages and tools like Java, Pascal, C#, C++, C, CGI, Perl, SQL, APIs, SDKs, assembly, firmware, microcode, and/or other languages and tools. Software, whether an entire system or a component of a system, may be embodied as an article of manufacture and maintained or provided as part of a computer-readable medium as defined previously. Another form of the software may include signals that transmit program code of the software to a recipient over a network or other communication medium. Thus, in one example, a computer-readable medium has a form of signals that represent the software/firmware as it is downloaded from a web server to a user. In another example, the computer-readable medium has a form of the software/firmware as it is maintained on the web server. Other forms may also be used.

“User”, as used herein, includes but is not limited to one or more persons, software, computers or other devices, or combinations of these.

Some portions of the detailed descriptions that follow are presented in terms of algorithms and symbolic representations of operations on data bits within a memory. These algorithmic descriptions and representations are the means used by those skilled in the art to convey the substance of their work to others. An algorithm is here, and generally, conceived to be a sequence of operations that produce a result. The operations may include physical manipulations of physical quantities. Usually, though not necessarily, the physical quantities take the form of electrical or magnetic signals capable of being stored, transferred, combined, compared, and otherwise manipulated in a logic and the like.

It has proven convenient at times, principally for reasons of common usage, to refer to these signals as bits, values, elements, symbols, characters, terms, numbers, or the like. It should be borne in mind, however, that these and similar terms are to be associated with the appropriate physical quantities and are merely convenient labels applied to these quantities. Unless specifically stated otherwise, it is appreciated that throughout the description, terms like processing, computing, calculating, determining, displaying, or the like, refer to actions and processes of a computer system, logic, processor, or similar electronic device that manipulates and transforms data represented as physical (electronic) quantities.

Information provided by a crawling system may be indexed, for example, by a search system to which the crawling system provides the information. Queries to locate relevant documents can thus be answered from the index rather than by searching the underlying sources directly.

FIG. 1 illustrates an enterprise search system 100 that can be employed, for example, to facilitate replication and/or high availability of enterprise search information. The enterprise search system 100 includes a crawl search system 110 and one or more target search system(s) 120 that replicate a primary index 150. In one embodiment, the target search system(s) 120 can replicate the primary index 150 without re-crawling sources.

In order to achieve replication of index information to facilitate high availability of search information, logically and/or physically separate indexes can be maintained, for example, a primary index and one or more replicated indexes. Conventionally, each search system separately crawled sources to obtain information regarding items (e.g., documents, files, web pages, emails, spreadsheets, databases, etc.) of an enterprise. Each search system then performed additional annotations such as metadata extraction and/or filtering.

The information obtained was then indexed for use during search. For example, an index can organize content, metadata, security information, and so on to support queries that search for documents and/or content. Rather than having to search the entire enterprise with respect to each query, relevant results can be identified through the index.
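To make the role of such an index concrete, the following is a minimal sketch, assuming a simple in-memory inverted index; the class name `SimpleIndex` and its methods are hypothetical and are not taken from the described systems. It maps terms to item identifiers and keeps a per-item access control list so that a query is answered from the index and filtered by the requesting user, rather than by scanning every source.

```python
from collections import defaultdict


class SimpleIndex:
    """Hypothetical inverted index: term -> item IDs, plus a per-item ACL."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> {item_id}
        self.acl = {}                     # item_id -> set of allowed users

    def add(self, item_id, text, allowed_users):
        # Record every term of the item and remember who may see it.
        for term in text.lower().split():
            self.postings[term].add(item_id)
        self.acl[item_id] = set(allowed_users)

    def search(self, term, user):
        # Look the term up directly instead of crawling the sources,
        # then drop items the user is not permitted to see.
        hits = self.postings.get(term.lower(), set())
        return sorted(i for i in hits if user in self.acl.get(i, set()))


index = SimpleIndex()
index.add("doc1", "Quarterly budget spreadsheet", {"alice", "bob"})
index.add("doc2", "Budget email thread", {"alice"})
print(index.search("budget", "bob"))    # ['doc1']
print(index.search("budget", "alice"))  # ['doc1', 'doc2']
```

Real indexes would add tokenization, ranking, and persistent storage; the sketch only illustrates why a query need not touch the sources.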

Thus, with conventional systems, redundant information was obtained, annotated, indexed and maintained in order to facilitate high availability of search information. In the event that a primary index failed, one or more redundant indexes were available to facilitate searching. However, the redundant crawling of sources by each search system can be unduly burdensome, for example, by placing an undue load on the sources. Additionally, maintaining synchronization between the conventional search systems can be problematic.

With the enterprise search system 100, a crawler logic 130 can crawl items associated with sources 140 a-140 n (collectively “the sources 140”) and identify modification(s) to items associated with the sources 140. A source can include a datastore/repository, a database, a website, or other type of information source that can be crawled. The crawler logic 130 can store information regarding modification(s) to the items associated with the sources 140 in the primary index 150. While depicted in FIG. 1 as physically separate from the crawl search system 110, in one example, the crawler logic 130 is a component of the crawl search system 110.

For example, the crawler logic 130 can be configured to access items stored on different sources 140 belonging to an enterprise. The items may have different document types, different security settings, and so on. The crawler logic 130 can detect whether an item or information associated with an item has changed since a previous crawl.

For example, the crawler logic 130 can identify changes to content, document metadata, and document security information (e.g., an access control list (ACL)). Additionally, the crawler logic 130 can identify changes to an Access Control List Identifier (ACL-ID), an owner globally unique identifier (GUID), and so on. The crawler logic 130 can selectively mark a document for re-indexing if there has been a change. The indexing may include organizing content and accessible user information (e.g., security settings), which can then be used to support secure queries.
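One way a crawler logic could detect such changes is sketched below, assuming each crawled item is available as a dictionary and that a digest over the compared fields is stored between crawls. The field names (`content`, `metadata`, `acl`, `acl_id`, `owner_guid`), the function names, and the detection of deletions are illustrative assumptions, not details of the described system.

```python
import hashlib
import json


def fingerprint(item):
    """Digest over content, metadata, ACL, ACL-ID and owner GUID.

    Field names are hypothetical; a real crawler would use its own schema.
    """
    relevant = {
        "content": item.get("content", ""),
        "metadata": item.get("metadata", {}),
        "acl": sorted(item.get("acl", [])),
        "acl_id": item.get("acl_id"),
        "owner_guid": item.get("owner_guid"),
    }
    encoded = json.dumps(relevant, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def detect_changes(previous, current):
    """Compare this crawl with the previous one.

    previous: item_id -> fingerprint recorded on the last crawl
    current:  item_id -> item dict seen on this crawl
    Returns (list of (item_id, op) modifications, new fingerprint map).
    """
    changes = []
    seen = {}
    for item_id, item in current.items():
        fp = fingerprint(item)
        seen[item_id] = fp
        if item_id not in previous:
            changes.append((item_id, "add"))
        elif previous[item_id] != fp:
            changes.append((item_id, "update"))  # mark for re-indexing
    # Assumption: items missing from this crawl are reported as deleted.
    for item_id in previous:
        if item_id not in current:
            changes.append((item_id, "delete"))
    return changes, seen


crawl1 = {"a": {"content": "v1", "acl": ["alice"]}, "b": {"content": "x"}}
changes, state = detect_changes({}, crawl1)
print(changes)  # [('a', 'add'), ('b', 'add')]

# Only the ACL of "a" changes, and "b" disappears on the next crawl.
crawl2 = {"a": {"content": "v1", "acl": ["alice", "bob"]}}
changes, state = detect_changes(state, crawl2)
print(changes)  # [('a', 'update'), ('b', 'delete')]
```

Because an ACL change alters the digest, a security-only change still marks the item for re-indexing, which matters for secure queries.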

Additionally, the crawler logic 130 can provide information regarding the modification(s) to the sources 140 to a pipeline processor 160 of the crawl search system 110. The pipeline processor 160 can provide (e.g., propagate) information regarding the modification(s) to the sources 140 to one or more target search systems 120.

The target search systems 120 can include an index logic 170 that receives the information regarding the modification(s) from the pipeline processor 160. Accordingly, the target search systems 120 do not separately crawl the sources 140; instead, they process the information regarding modification(s) received from the crawl search system 110.

The index logic 170 can then independently update an index 180 associated with the particular target search system 120. Similar to the primary index 150, the index 180 of the target search system 120 can organize content, metadata, security information, and so on to support queries that search for documents and/or content. The primary index 150 and the index 180 can employ similar and/or different storage mechanisms (e.g., protocols).
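The division of labor between the pipeline processor and the index logic of each target search system can be sketched as follows. This is a minimal illustration, assuming modification records carry an item identifier, an operation, and content; the names `Modification`, `IndexLogic`, and `PipelineProcessor`, and the `add`/`update`/`delete` vocabulary, are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Modification:
    # Hypothetical record forwarded by the pipeline processor.
    item_id: str
    op: str            # "add", "update" or "delete" (assumed vocabulary)
    content: str = ""


@dataclass
class IndexLogic:
    """Target-side logic that maintains its own copy of the index."""
    name: str
    index: dict = field(default_factory=dict)  # item_id -> content

    def apply(self, mod):
        if mod.op == "delete":
            self.index.pop(mod.item_id, None)
        else:
            self.index[mod.item_id] = mod.content


class PipelineProcessor:
    """Crawl-side component that propagates modification records to every
    registered target; the targets never crawl the sources themselves."""

    def __init__(self):
        self.targets = []

    def register(self, target):
        self.targets.append(target)

    def propagate(self, mods):
        for target in self.targets:
            for mod in mods:
                target.apply(mod)  # each target updates its index on its own


pipeline = PipelineProcessor()
replicas = [IndexLogic("target-1"), IndexLogic("target-2")]
for r in replicas:
    pipeline.register(r)

pipeline.propagate([Modification("doc1", "add", "hello"),
                    Modification("doc2", "add", "world")])
pipeline.propagate([Modification("doc1", "delete")])
print(replicas[0].index)  # {'doc2': 'world'}
```

Since each target holds its own dictionary, the sketch also reflects that the replicated indexes may use storage mechanisms different from the primary index.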

In one example, the crawler logic 130 can include a set of crawlers, which can touch (e.g., locate, examine, retrieve from) many sources. Further, the crawler logic 130 can retrieve data (e.g., content), metadata (e.g., URL, a content type, a crawl depth, a language code, an attribute count, an attribute list, an owner GUID, a source hierarchy, title, type, creation date, modification date and so on), and/or security information (e.g., access control list (ACL)) associated with an item. This information can be normalized so that similar information concerning an email, a calendar entry, a web page, and so on, can be processed in a consistent and/or uniform manner. Normalized data may include, for example, a first paragraph of content, a keyword(s) extracted from content, author information, creation information, modification information, security information, and so on.
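A minimal sketch of such normalization follows, assuming hypothetical source-specific field names (e.g., `subject` and `from` for email, `summary` and `organizer` for calendar entries). Each item kind is mapped onto one common record so that downstream logic, such as extracting a first paragraph, treats all kinds uniformly.

```python
def normalize(kind, raw):
    """Map source-specific fields onto one common record.

    All field names here are hypothetical; real connectors would define
    their own.
    """
    if kind == "email":
        return {"title": raw["subject"], "author": raw["from"],
                "modified": raw["sent"], "body": raw["body"]}
    if kind == "calendar":
        return {"title": raw["summary"], "author": raw["organizer"],
                "modified": raw["updated"], "body": raw.get("notes", "")}
    if kind == "web":
        return {"title": raw["title"], "author": raw.get("author", ""),
                "modified": raw["last_modified"], "body": raw["html_text"]}
    raise ValueError(f"unsupported item kind: {kind}")


def first_paragraph(record):
    # One of the derived fields mentioned above; paragraphs are assumed
    # to be separated by a blank line.
    return record["body"].split("\n\n", 1)[0]


email = normalize("email", {"subject": "Q3 plan", "from": "alice",
                            "sent": "2006-03-01",
                            "body": "Intro para.\n\nDetails."})
print(first_paragraph(email))  # Intro para.
```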

FIG. 2 illustrates an enterprise search system 200 that can be employed, for example, to facilitate replication and/or high availability of enterprise search information. The enterprise search system 200 includes a crawl search system 210 and a plurality of target search systems 220 a-220 m. As discussed in greater detail below, individually and/or collectively, the target search systems 220 can replicate a primary index 250 without re-crawling sources 240.

In one example, the enterprise search system 200 includes a crawler logic 230 that can crawl items associated with sources 240 a-240 p (collectively “the sources 240”) and identify modification(s) to items associated with the sources 240. The sources 240 represent similar types of sources as the sources 140 from FIG. 1. The crawler logic 230 can store information regarding modification(s) to the items associated with the sources 240 in the primary index 250. In another example, the crawler logic 230 is physically and/or logically separate from the crawl search system 210.

The crawler logic 230 can be configured to access items stored on different sources 240 belonging to an enterprise. The items can have different document types, different security settings, and so on. The crawler logic 230 can detect whether an item or information associated with an item has changed since a previous crawl.

The crawler logic 230 can provide information regarding the modification(s) to the sources 240 to a pipeline processor 260 of the crawl search system 210. The pipeline processor 260 can provide (e.g., propagate) information regarding the modification(s) to the sources 240 to one or more target search systems 220. In this example, the target search systems 220 collectively can replicate the primary index 250 without re-crawling the sources 240. Thus, each target search system 220 a-220 m can obtain crawl information without its own dedicated crawler that would otherwise duplicate the crawls of other dedicated crawlers.

In one example, each of the target search systems 220 receives substantially all of the modification information from the pipeline processor 260. Thus, each target search system 220 can independently serve as a backup to the crawl search system 210.
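Because every fully replicated target search system holds substantially all of the modification information, a query can fall back to a replica when the primary index is unavailable. The sketch below assumes each index is exposed as a callable that either returns results or raises a hypothetical `SearchUnavailable` exception; neither name comes from the described system, and a production deployment would also need health checks and timeouts.

```python
class SearchUnavailable(Exception):
    """Raised by a hypothetical index that cannot serve queries."""


def search_with_failover(indexes, term):
    """Try the primary index first, then each replica in order.

    indexes: list of callables taking a term and returning results;
    the first entry plays the role of the primary index.
    """
    last_error = None
    for query in indexes:
        try:
            return query(term)
        except SearchUnavailable as exc:
            last_error = exc  # fall through to the next replica
    raise SearchUnavailable("no index available") from last_error


def primary(term):
    raise SearchUnavailable("primary index down")


def replica(term):
    return [f"{term}-hit"]


print(search_with_failover([primary, replica], "budget"))  # ['budget-hit']
```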

In another example, the indexing provided by the target search systems 220 is partitioned, for example, temporally, by logical grouping (e.g., author, enterprise business group, etc.), by type of item (e.g., word processing, email, etc.), and the like. The pipeline processor 260 can then provide modification information only to the appropriate target search system(s) 220. Alternatively, the pipeline processor 260 can broadcast substantially all of the modification information to each of the target search systems 220, with only the appropriate individual target search system(s) 220 processing the modification information.

For example, a first target search system 220 a can be dedicated to indexing word processing items and a second target search system 220 m can be dedicated to indexing spreadsheets. When the pipeline processor 260 receives modification information regarding a word processing item, that information can be forwarded to the first target search system 220 a and not to the second target search system 220 m. Alternatively, the modification information can be sent to some or all of the target search systems 220 a-220 m, with the appropriate target search system 220 processing the modification information that is relevant to it (e.g., the information is filtered by the receiving target search system).
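The two delivery strategies for a partitioned index, targeted routing by the pipeline processor versus broadcast with filtering at the receiver, can be contrasted in a short sketch. The item-type labels, the target names `"220a"` and `"220m"`, and the dictionary-shaped modification records are illustrative assumptions.

```python
from collections import defaultdict


def route_targeted(mods, routes):
    """Pipeline-side routing: forward each modification only to the target
    that owns its item type. routes: item_type -> target name."""
    outbox = defaultdict(list)
    for mod in mods:
        target = routes.get(mod["type"])
        if target is not None:  # no owner: nobody indexes this type
            outbox[target].append(mod)
    return dict(outbox)


def broadcast_and_filter(mods, targets, accepts):
    """Every target receives every modification and keeps only the types
    it is responsible for. accepts: target -> set of item types."""
    return {t: [m for m in mods if m["type"] in accepts[t]] for t in targets}


mods = [{"id": "r1", "type": "word_processing"},
        {"id": "s1", "type": "spreadsheet"},
        {"id": "e1", "type": "email"}]
routes = {"word_processing": "220a", "spreadsheet": "220m"}
accepts = {"220a": {"word_processing"}, "220m": {"spreadsheet"}}

# Both strategies yield the same partition; they differ in where the
# filtering work happens and how much data crosses the network.
print(route_targeted(mods, routes))
print(broadcast_and_filter(mods, ["220a", "220m"], accepts))
```

Targeted routing reduces traffic to each target, while broadcast-and-filter keeps the pipeline processor simpler and lets a target change its responsibilities without reconfiguring the sender.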

In another example, items associated with a particular source can be provided to a particular target search system 220 a. In yet another example, items associated with a particular entity (e.g., author, enterprise division, etc.) can be provided to a particular target search system 220 a.

The target search systems 220 can include an index logic 270 that receives the information regarding the modification(s) from the pipeline processor 260. Accordingly, the target search systems 220 do not separately crawl the sources 240 but rather share the crawled information and process the modifications received from the crawl search system 210. Thus, redundant crawling of sources 240 can be reduced or eliminated by sharing/distributing crawled information between the target search systems 220 a-220 m.

The index logic 270 can then independently update an index 280 associated with the particular target search system 220. The index 280 of the target search system 220 can organize content, metadata, security information, and so on to support queries that search for documents and/or content.

FIG. 3 illustrates a distributed enterprise crawl system 300 that can be employed, for example, to facilitate replication and/or high availability of enterprise search information. The distributed enterprise crawl system 300 includes a first search system 310 a and a second search system 310 b.

Each search system 310 includes a crawler logic 320, an index 330, a pipeline processor 340 and an index logic 350. The crawler logic 320 can detect whether an item or information associated with an item has changed since a previous crawl. The crawler logic 320 can be configured to access items stored in particular source(s) 360 (e.g., 360 a, 360 b . . . ) and/or portions of source(s) 360 (e.g., 360 a, 360 b . . . ) belonging to an enterprise. For example, the first search system 310 a can be configured to crawl a first source 360 a while the second search system 310 b can be configured to crawl a second source 360 b.

The crawler logic 320 a of the first search system 310 a can provide information regarding the modification(s) to the first source 360 a to the pipeline processor 340 a of the first search system 310 a. The pipeline processor 340 a of the first search system 310 a can provide (e.g., propagate) information regarding the modification(s) to the first source 360 a to the index logic 350 b of the second search system 310 b. The index logic 350 b of the second search system 310 b can then independently update the index 330 b of the second search system 310 b.
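The exchange between two peer search systems can be sketched as below. Each hypothetical `SearchSystem` node crawls only its own source, and its pipeline step feeds the detected changes to the index logic of every peer. Applying the changes to the node's own index as well is an assumption of the sketch, as are all class, method, and attribute names; deletions are ignored for brevity.

```python
class SearchSystem:
    """Hypothetical node: crawls its own source and feeds detected changes
    to its peers' index logic (and, by assumption, to its own)."""

    def __init__(self, name, source):
        self.name = name
        self.source = source  # item_id -> content for the assigned source
        self.index = {}       # this node's own index
        self.peers = []
        self._last_crawl = {}

    def crawl(self):
        # Crawler logic: report items that are new or changed since the
        # previous crawl (deletions are not handled in this sketch).
        changes = {k: v for k, v in self.source.items()
                   if self._last_crawl.get(k) != v}
        self._last_crawl = dict(self.source)
        return changes

    def index_logic(self, changes):
        # Index logic: update this node's index independently.
        self.index.update(changes)

    def run_cycle(self):
        # Pipeline processor: hand local changes to self and every peer.
        changes = self.crawl()
        for node in [self, *self.peers]:
            node.index_logic(changes)


first = SearchSystem("310a", {"x": "1"})
second = SearchSystem("310b", {"y": "2"})
first.peers, second.peers = [second], [first]
first.run_cycle()
second.run_cycle()
print(first.index == second.index)  # True
```

Neither node crawls the other's source, yet both end up indexing both sources.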

Example methods may be better appreciated with reference to flow diagrams. While for purposes of simplicity of explanation, the illustrated methodologies are shown and described as a series of blocks, it is to be appreciated that the methodologies are not limited by the order of the blocks, as some blocks can occur in different orders and/or concurrently with other blocks from that shown and described. Moreover, less than all the illustrated blocks may be required to implement an example methodology. Blocks may be combined or separated into multiple components. Furthermore, additional and/or alternative methodologies can employ additional, not illustrated blocks. While the figures illustrate various actions occurring in serial, it is to be appreciated that various actions could occur concurrently, substantially in parallel, and/or at substantially different points in time.

Illustrated in FIG. 4 is an example methodology 400 for replicating search information. The illustrated elements denote “processing blocks” that may be implemented in logic. In one example, the processing blocks may represent executable instructions that cause a computer, processor, and/or logic device to respond, to perform an action(s), to change states, and/or to make decisions. Thus, the described methodologies can be implemented as processor executable instructions and/or operations provided by a computer-readable medium. In another example, the processing blocks may represent functions and/or actions performed by functionally equivalent circuits such as an analog circuit, a digital signal processor circuit, an application specific integrated circuit (ASIC), or other logic device. The diagram of FIG. 4, as well as the other illustrated diagrams, is not intended to limit the implementation of the described examples. Rather, the diagrams illustrate functional information one skilled in the art could use to design/fabricate circuits, generate software, or use a combination of hardware and software to perform the illustrated processing.

It will be appreciated that electronic and software applications may involve dynamic and flexible processes such that the illustrated blocks can be performed in sequences other than the one shown and/or blocks may be combined or separated into multiple components. Blocks may also be performed concurrently, substantially in parallel, and/or at substantially different points in time. They may also be implemented using various programming approaches such as machine language, procedural, object oriented and/or artificial intelligence techniques. A method can be implemented by processor executable instructions provided by a computer-readable medium that, when executed, cause a computing system to perform the method. The foregoing applies to all methodologies described herein.

FIG. 4 illustrates a method 400 for replicating search information. At 410, items associated with sources are crawled, for example, by a crawler logic 130, 230, 320. At 420, changes to the items are identified (e.g., by the crawler logic 130, 230, 320).

At 430, information regarding identified changes is broadcast to one or more target search systems (e.g., target search system 120, 220), for example, by a pipeline processor 160, 260. At 440, each target search system independently updates its associated index and the method 400 ends.

While FIG. 4 illustrates various actions occurring in serial, it is to be appreciated that various actions illustrated in FIG. 4 could occur substantially in parallel. By way of illustration, a first process could crawl items associated with sources. Similarly, a second process could identify changes to the items, while a third process could broadcast information regarding identified changes to target search systems. While three processes are described, it is to be appreciated that a greater and/or lesser number of processes could be employed and that lightweight processes, regular processes, threads, and other approaches could be employed.
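The three concurrent processes can be sketched with threads connected by queues, one of the approaches the paragraph allows. The sketch assumes crawled items arrive as `(item_id, content)` pairs, that the previous crawl is available as a dictionary, and that each target index is a plain dictionary; the stage function names and the `STOP` sentinel are hypothetical. The assignment expressions require Python 3.8 or later.

```python
import queue
import threading

STOP = object()  # sentinel that tells the next stage to shut down


def crawl_stage(sources, out_q):
    # First process: crawl items and pass them downstream.
    for item in sources:
        out_q.put(item)
    out_q.put(STOP)


def detect_stage(in_q, out_q, previous):
    # Second process: keep only items whose content changed.
    while (item := in_q.get()) is not STOP:
        item_id, content = item
        if previous.get(item_id) != content:
            out_q.put(item)
    out_q.put(STOP)


def broadcast_stage(in_q, targets):
    # Third process: hand each change to every target index.
    while (item := in_q.get()) is not STOP:
        for target in targets:
            target[item[0]] = item[1]


def run_pipeline(sources, previous, targets):
    q1, q2 = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=crawl_stage, args=(sources, q1)),
        threading.Thread(target=detect_stage, args=(q1, q2, previous)),
        threading.Thread(target=broadcast_stage, args=(q2, targets)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


replicas = [{}, {}]
run_pipeline([("a", "v2"), ("b", "v1")], {"b": "v1"}, replicas)
print(replicas)  # [{'a': 'v2'}, {'a': 'v2'}]
```

Only the broadcast thread writes to the target dictionaries, so no extra locking is needed in this sketch; the results are read after all threads have been joined.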

FIG. 5 illustrates a method 500 for replicating search information. At 510, items associated with sources are crawled. At 520, changes to the items are identified.

At 530, information regarding identified changes is provided to particular target search system(s). At 540, the particular target search system(s) independently update their associated indexes, and the method 500 ends.

FIG. 6 illustrates a method 600 for replicating search information. At 610, items associated with sources are crawled. At 620, changes to the items are identified.

At 630, information regarding identified changes is propagated to target search systems (e.g., substantially all target search systems). At 640, particular target search system(s) independently update their associated indexes, and the method 600 ends.

FIG. 7 illustrates a method 700 for replicating search information in a distributed crawl environment. At 710, each crawler of a plurality of search systems identifies changes to items for particular source(s) (e.g., sources that each crawler is assigned to search or is responsible for searching). At 720, each crawler broadcasts identified changes to at least some of the other search systems of the plurality. At 730, each search system can collect the crawled information from the different crawlers and independently update its associated index for the plurality of sources, and the method 700 ends.

In another embodiment, crawling assignments for crawling a plurality of information sources can be distributed between the plurality of crawlers. Each of the crawlers can then crawl its assigned sources and identify changes that have occurred to items within its assigned information sources. The identified changes are broadcast/provided by the crawlers to the plurality of search systems. With the collected changes, each search system can independently update its index to the plurality of information sources.
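The text does not prescribe how crawling assignments are made. The sketch below uses one possible policy, a stable CRC-32 hash of the source name modulo the number of crawlers, and then merges the per-source changes that every search system would receive. The function names and the dictionary shapes are hypothetical.

```python
import zlib


def assign_sources(sources, crawlers):
    """Spread sources across crawlers with a stable hash (one possible
    policy; the assignment scheme here is an assumption)."""
    assignments = {c: [] for c in crawlers}
    for src in sources:
        owner = crawlers[zlib.crc32(src.encode("utf-8")) % len(crawlers)]
        assignments[owner].append(src)
    return assignments


def collect_changes(assignments, changes_by_source):
    """Each crawler reports changes only for its own sources; every search
    system receives the union and updates its index for all sources.
    changes_by_source: source -> {item_id: new content}."""
    combined = {}
    for owned in assignments.values():
        for src in owned:
            combined.update(changes_by_source.get(src, {}))
    return combined


sources = ["mail", "wiki", "crm", "fileshare"]
plan = assign_sources(sources, ["c1", "c2", "c3"])
changes = collect_changes(plan, {"mail": {"m1": "v2"}, "wiki": {"w1": "v1"}})
print(changes)  # {'m1': 'v2', 'w1': 'v1'} (key order may vary)
```

CRC-32 is used instead of Python's built-in `hash()` because string hashing is randomized per process, whereas every machine must compute the same assignment for a source.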

In one example, methodologies are implemented as processor executable instructions and/or operations stored on a computer-readable medium. Thus, in one example, a computer-readable medium may store processor executable instructions operable to perform a method 400, 500, 600 for replicating search information. While the above methods are described as being stored on a computer-readable medium, it is to be appreciated that other example methods described herein can also be stored on a computer-readable medium.

FIG. 8 illustrates an example computing device in which example systems and methods described herein, and equivalents, can operate. The example computing device may be a computer 800 that includes a processor 802, a memory 804, and input/output ports 810 operably connected by a bus 808. In one example, the computer 800 may include a crawl search system 830 configured to facilitate propagation of modified crawl information to target search system(s). The crawl search system 830 can be implemented similar to the crawl search system 110, 210 described in FIGS. 1 and 2, respectively, and/or the other systems and methods described herein.

Generally describing an example configuration of the computer 800, the processor 802 can be any of a variety of processors, including dual microprocessor and other multi-processor architectures. The memory 804 can include volatile memory and/or non-volatile memory. The non-volatile memory can include, but is not limited to, ROM, PROM, EPROM, EEPROM, and the like. Volatile memory can include, for example, RAM, static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), and direct Rambus RAM (DRRAM).

A disk 806 may be operably connected to the computer 800 via, for example, an input/output interface (e.g., card, device) 818 and an input/output port 810. The disk 806 can include, but is not limited to, devices like a magnetic disk drive, a solid state disk drive, a floppy disk drive, a tape drive, a Zip drive, a flash memory card, and/or a memory stick. Furthermore, the disk 806 can include optical drives like a CD-ROM, a CD recordable drive (CD-R drive), a CD rewriteable drive (CD-RW drive), and/or a digital video ROM drive (DVD ROM). The memory 804 can store processes 814 and/or data 816, for example. The disk 806 and/or memory 804 can store an operating system that controls and allocates resources of the computer 800.

The bus 808 can be a single internal bus interconnect architecture and/or other bus or mesh architectures. While a single bus is illustrated, it is to be appreciated that the computer 800 may communicate with various devices, logics, and peripherals using other buses that are not illustrated (e.g., PCIe, SATA, InfiniBand, IEEE 1394, USB, Ethernet). The bus 808 can be of a variety of types including, but not limited to, a memory bus or memory controller, a peripheral bus or external bus, a crossbar switch, and/or a local bus. The local bus can be of varieties including, but not limited to, an industry standard architecture (ISA) bus, a Micro Channel architecture (MCA) bus, an extended ISA (EISA) bus, a peripheral component interconnect (PCI) bus, a universal serial bus (USB), and a small computer systems interface (SCSI) bus.

The computer 800 may interact with input/output devices via I/O interfaces 818 and input/output ports 810. Input/output devices can include, but are not limited to, a keyboard, a microphone, a pointing and selection device, cameras, video cards, displays, the disk 806, network devices 820, and the like. The input/output ports 810 can include, but are not limited to, serial ports, parallel ports, and USB ports.

The computer 800 can operate in a network environment and thus may be connected to network devices 820 via the I/O interfaces 818 and/or the I/O ports 810. Through the network devices 820, the computer 800 may interact with a network. Through the network, the computer 800 may be logically connected to remote computers. The networks with which the computer 800 may interact include, but are not limited to, a local area network (LAN), a wide area network (WAN), and other networks. The network devices 820 can connect to LAN technologies including, but not limited to, fiber distributed data interface (FDDI), copper distributed data interface (CDDI), Ethernet (IEEE 802.3), token ring (IEEE 802.5), wireless computer communication (IEEE 802.11), Bluetooth (IEEE 802.15.1), and the like. Similarly, the network devices 820 can connect to WAN technologies including, but not limited to, point-to-point links, circuit switching networks like integrated services digital networks (ISDN), packet switching networks, and digital subscriber lines (DSL).

While example systems, methods, and so on have been illustrated by describing examples, and while the examples have been described in considerable detail, it is not the intention of the applicants to restrict or in any way limit the scope of the appended claims to such detail. It is, of course, not possible to describe every conceivable combination of components or methodologies for purposes of describing the systems, methods, and so on described herein. Additional advantages and modifications will readily appear to those skilled in the art. Therefore, the invention is not limited to the specific details, the representative apparatus, and illustrative examples shown and described. Thus, this application is intended to embrace alterations, modifications, and variations that fall within the scope of the appended claims. Furthermore, the preceding description is not meant to limit the scope of the invention. Rather, the scope of the invention is to be determined by the appended claims and their equivalents.

To the extent that the term “includes” or “including” is employed in the detailed description or the claims, it is intended to be inclusive in a manner similar to the term “comprising” as that term is interpreted when employed as a transitional word in a claim. Furthermore, to the extent that the term “or” is employed in the detailed description or claims (e.g., A or B), it is intended to mean “A or B or both”. When the applicants intend to indicate “only A or B but not both”, then the term “only A or B but not both” will be employed. Thus, use of the term “or” herein is the inclusive, and not the exclusive, use. See Bryan A. Garner, A Dictionary of Modern Legal Usage 624 (2d ed. 1995).

1. An enterprise search system, comprising: a target search system comprising an index logic that uses modified crawl information related to items associated with sources to maintain an index that supports searching of the items; and, a crawl search system comprising a pipeline processor configured to receive modified crawl information related to the items and to propagate the modified crawl information to the target system.
2. The enterprise search system of claim 1, where the index functionally replicates a primary index of the crawl search system.
3. The enterprise search system of claim 2, where the index and the primary index employ different storage mechanisms.
4. The enterprise search system of claim 1, further comprising a plurality of target search systems, where the index of each target system independently functionally replicates a primary index of the crawl search system.
5. The enterprise search system of claim 1, further comprising a plurality of target search systems, where the indexes of the target systems collectively functionally replicate a primary index of the crawl search system.
6. The enterprise search system of claim 1, where the items comprise at least one of documents, files, web pages, emails, spreadsheets, and/or databases.
7. The enterprise search system of claim 1, where the modified crawl information comprises at least one of modified content, metadata, and/or security information.
8. The enterprise search system of claim 1, further comprising a crawler logic configured to access the items to determine a modification to the items, the crawler logic further configured to provide modified crawl information to the pipeline processor.
9. An enterprise search system, comprising: a plurality of target search systems, each target search system comprising an index logic that uses modified crawl information related to items associated with sources to maintain an index that supports searching of the items; a crawl search system comprising: a pipeline processor configured to receive modified crawl information related to the items and to propagate the modified crawl information to the plurality of target systems; and, a crawler logic configured to access the items to determine a modification to the items, the crawler logic further configured to provide modified crawl information to the pipeline processor.
10. The enterprise search system of claim 9, where each of the plurality of target search systems is configured to process only a particular type of item.
11. The enterprise search system of claim 9, where each of the plurality of target search systems is configured to process only items associated with a particular source.
12. The enterprise search system of claim 9, where each of the plurality of target search systems is configured to process only items associated with one or more entities.
13. The enterprise search system of claim 9, where the index of each target system independently functionally replicates a primary index of the crawl search system.
14. The enterprise search system of claim 9, where the indexes of the target systems collectively functionally replicate a primary index of the crawl search system.
15. A distributed enterprise crawl system, comprising: a first search system and a second search system, each search system comprising: an index logic that uses modified crawl information related to items associated with sources to maintain an index that supports searching of the items; a pipeline processor configured to receive modified crawl information related to the items and to propagate the modified crawl information to the other search system; and, a crawler logic configured to access the items to determine a modification to the items, the crawler logic further configured to provide modified crawl information to the pipeline processor and to the index of the particular search system, where the first search system is configured to access items associated with a first source and the second search system is configured to access items associated with a second source.
16. The distributed enterprise crawl system of claim 15, where the index of each search system independently functionally replicates the index of the other search system.
17. A method for replicating search information, comprising: crawling items associated with sources; identifying changes to the items; broadcasting information regarding the identified changes to a plurality of target search systems; and, independently updating an index associated with each target search system based on the identified changes.
18. The method of claim 17, where each of the plurality of target search systems is configured to process only a particular type of item.
19. The method of claim 17, where each of the plurality of target search systems is configured to process only items associated with a particular source.
20. The method of claim 17, where each of the plurality of target search systems is configured to process only items associated with one or more entities.
21. The method of claim 17, where the index of each target system independently functionally replicates a primary index of a crawl search system.
22. The method of claim 17, where the indexes of the target systems collectively functionally replicate a primary index of a crawl search system.
23. A computer-readable medium for providing processor executable instructions that when executed cause a computer to perform a method for replicating search information in a computing system comprising a plurality of search systems and a plurality of crawlers, the method comprising: distributing crawling assignments of a plurality of information sources between the plurality of crawlers; crawling and identifying, by each of the crawlers, changes to items from one or more assigned information sources; broadcasting, by each of the crawlers, the identified changes to the plurality of search systems; and independently updating, by each of the search systems, an index to the plurality of information sources using the identified changes broadcasted from each of the crawlers.